The free GCC compiler is a very large piece of software, compiling source code in several languages for many
targets on various systems. It can be extended by plugins, which may take advantage of its power
to provide extra specific functionality (warnings, optimizations, source refactoring or navigation)
by processing various GCC internal representations (Gimple, Tree, ...). Writing plugins in C is a
complex and time-consuming task, but customizing GCC by embedding an existing scripting language
is impractical. We describe MELT, a specific Lisp-like DSL which fits well into existing
GCC technology and offers high-level features (functional, object or reflective programming, pattern
matching). MELT is translated to C code suited to GCC internals and provides various features to facilitate
this. This work shows that even huge legacy software can be extended a posteriori by specifically
tailored and translated high-level DSLs.



1 Introduction 

GCC¹ is an industrial-strength free compiler for many source languages (C, C++, Ada, Objective C,
Fortran, Go, ...), targeting about 30 different machine architectures, and supported on many operating
systems. Its source code is huge (4.296 MLOC² for GCC 4.6.0), heterogeneous, and still growing
by 6% annually³. It has no single main architect and hundreds of (mostly full-time) contributors, who
follow strict social rules⁴.

1.1 The powerful GCC legacy 

The several GCC front-ends (parsing C, C++, Go, ... source) produce common internal AST (abstract
syntax tree) representations called Tree and Generic. These are later transformed into the middle-end
internal representation, the Gimple statements, through a transformation called gimplification. The bulk of
the compiler is its middle-end, which operates repeatedly on these Gimple representations⁵. It contains
nearly 200 passes moulding these (in different forms). Finally, back-ends (specific to the target) work
on Register Transfer Language (RTL) representations and emit assembly code. Besides that, many other

1 Gnu Compiler Collection (gcc 4.6.0, released on March 25th, 2011) on gcc.gnu.org

2 4.296 million lines of source code, measured with David Wheeler's SLOCCount. Most other tools give bigger code
measures, e.g., ohcount gives 8.370 MLOC of source, with 5.477 MLOC of code and 1.689 MLOC of comments.

3 GCC 4.4.1, released July 22nd, 2009, was 3.884 MLOC, so a 0.412 MLOC = 10.6% increase in 1.67 years.

4 Every submitted code patch should be accepted by a code reviewer who cannot be the author of the patch, but there is no
project leader or head architect, like Linus Torvalds is for the Linux kernel. So GCC does not have a clean, well-designed architecture.

5 The GCC middle-end does not depend upon the source language or the target processor (except with parameters giving
sizeof (int) etc.).
data structures exist within GCC (and a lot of global variables). Most of the compiler code and
optimizations work by various transformations on middle-end internal representations. GCC source code is
mostly written in C (with a few parts in C++, or Ada), but it also has several internal C code generators.
GCC does not use parser generators (like flex, bison, etc.).

It should be stressed that most internal GCC representations are constantly evolving, and there is
no stability⁶ of the internal GCC API⁷. This makes the embedding of existing scripting languages (like



External plugins can enhance or modify the behavior of the GCC compiler through a defined interface,
practically provided by a set of C header files, and made of functions, many C macros, and coding
conventions. Plugins are loaded as dlopen-ed dynamic shared objects at gcc run time. They can profit from
all the variety and power of the many internal representations and processing of GCC. Plugins enhance
GCC by inserting new passes and/or by responding to a set of plugin events (like PLUGIN_FINISH_TYPE
when a type has been parsed, PLUGIN_PRAGMAS to register new pragmas, ...).

GCC plugins can add specific warnings (e.g., to a library), specific optimizations (e.g., transform
fprintf (stdout, ...) → printf (...) in user code with #include <stdio.h>), compute software metrics,
help with source code navigation or code refactoring, etc. GCC extensions or plugins enable using and
extending GCC for non code-generation activities like static analysis [9, 2, 17, 21], threat detection (like
in Two, Coverity™⁸ or Astrée), code refactoring, coding rules validation [16], etc. They could
provide any processing taking advantage of the many facilities already existing inside GCC. However,
since coding GCC plugins in C is not easy, a higher-level DSL could help. Because GCC plugins are
usually specific to a narrow user community, shortening their development time (through a higher-level
language) makes sense.

/* A node in a gimple_seq_d. */
struct GTY((chain_next ("%h.next"), chain_prev ("%h.prev"))) gimple_seq_node_d {
  gimple stmt;
  struct gimple_seq_node_d *prev;
  struct gimple_seq_node_d *next;
};



Since compilers handle many complex (perhaps circular) data structures for their internal
representations, explicitly managing memory is cumbersome during compilation. So the GCC community has
added a crude garbage collector [11], Gg-c (GCC Garbage Collector): many C struct-ures in GCC code
are annotated with GTY (figure 1) to be handled by Gg-c; passes can allocate them, and a precise⁹ mark
and sweep garbage collection may be triggered by the pass manager only between passes. Gg-c does not
know about local pointers, so garbage collected data is live and kept only if it is (indirectly) reachable
from known global or static GTY-annotated variables (data reachable only from local variables would be
lost). Data internal to a GCC pass is usually manually allocated and freed. GTY annotations on types and

6 This is nearly a dogma of its community, to discourage proprietary software abuse of GCC. 

7 GCC has no well defined and documented Application Programming Interface for compiler extensions; its API is just a big
set of header files, so it is a bit messy for outsiders.
8 See www.coverity.com

9 Gg-c is a precise G-C knowing each pointer to handle; using Boehm's conservative garbage collector with ambiguous roots
inside GCC has been considered and rejected on performance grounds.




(code from gcc/gimple.h in GCC)

Figure 1: example of GTY annotation for Gg-c
variables inside GCC source are processed by gengtype, a specialized generator (producing C code for
Gg-c allocation and marking routines and roots registration). There are more than 1800 GTY-ed types
known by Gg-c, such as: gimple (pointing to the representation of a Gimple statement), tree (pointer to
a structure representing a Tree), basic_block (pointing to the representation of a basic block of Gimple-s),
edge (pointing to the data representing an edge between basic blocks in the control flow graph), etc.
Sadly, not all GCC data is handled by Gg-c; a lot of data is still manually micro-managed. We call Stuff
all the GCC internal data, either garbage-collected and GTY-annotated like gimple, tree, ..., or outside
the heap like raw long numbers, or even manually allocated like struct opt_pass (data describing
GCC optimization passes).

GCC is a big legacy system, so its API is large and quite heterogeneous in style. It is not only made
of data declarations and functions operating on them, but also contains various C macros. In particular,
iterations inside internal representations may be provided by various styles of constructs:

1. Iterator abstract types like (to iterate on every stmt, a gimple inside a given basic block bb)

   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
        !gsi_end_p (gsi); gsi_next (&gsi)) {
     gimple stmt = gsi_stmt (gsi); /* handle stmt ... */ }

2. Iterative for-like macros, e.g., (to iterate for each basic block bb inside the current function cfun)

   basic_block bb; FOR_EACH_BB (bb) { /* process bb */ }

3. More rarely, passing a callback to an iterating "higher-order" C function, e.g., (to iterate inside
every index tree from ref and call idx_infer_loop_bounds on that index tree)

   for_each_index (&ref, idx_infer_loop_bounds, &data);

with a static function bool idx_infer_loop_bounds (tree base, tree *idx, void *dta) called
on every index tree base.
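The three iteration styles can be compared side by side on a toy data structure. The following is only a sketch, not GCC code: the linked list and every toy_* name are hypothetical stand-ins chosen so that the example compiles without the GCC headers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a GCC container: a singly-linked list of integers.
   All names below are hypothetical, not part of the GCC API.  */
struct toy_node { int value; struct toy_node *next; };

/* Style 1: an iterator abstract type with start/end/next/get operations.  */
typedef struct { struct toy_node *cur; } toy_iterator;
static toy_iterator toy_start (struct toy_node *head)
{ toy_iterator it = { head }; return it; }
static int toy_end_p (toy_iterator it) { return it.cur == NULL; }
static void toy_next (toy_iterator *it) { it->cur = it->cur->next; }
static int toy_get (toy_iterator it) { return it.cur->value; }

/* Style 2: a for-like macro hiding the loop header.  */
#define TOY_FOR_EACH(N, HEAD) \
  for ((N) = (HEAD); (N) != NULL; (N) = (N)->next)

/* Style 3: a "higher-order" function taking a callback and client data.
   Iteration stops early when the callback returns 0.  */
typedef int (*toy_callback) (int value, void *data);
static int toy_for_each (struct toy_node *head, toy_callback cb, void *data)
{
  for (struct toy_node *n = head; n != NULL; n = n->next)
    if (!cb (n->value, data))
      return 0;
  return 1;
}

/* Callback used with style 3: accumulate into the long pointed to by DATA.  */
static int add_to_sum (int value, void *data)
{
  *(long *) data += value;
  return 1;
}

/* The same sum, computed once per style.  */
static long sum_with_iterator (struct toy_node *head)
{
  long sum = 0;
  for (toy_iterator it = toy_start (head); !toy_end_p (it); toy_next (&it))
    sum += toy_get (it);
  return sum;
}

static long sum_with_macro (struct toy_node *head)
{
  long sum = 0;
  struct toy_node *n;
  TOY_FOR_EACH (n, head)
    sum += n->value;
  return sum;
}

static long sum_with_callback (struct toy_node *head)
{
  long sum = 0;
  toy_for_each (head, add_to_sum, &sum);
  return sum;
}
```

The macro style cannot be called through a foreign-function interface at all, and the iterator style costs several tiny calls per element. This is the kind of mismatch that item 3 and item 4 of the next section point out.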



1.2 Embedding an existing scripting language is impractical 

Interfacing GCC to an existing language implementation like Ocaml, Python, Guile, Lua, Ruby or some
other scripting language is not realistic¹⁰, because of an impedance mismatch:

1. Most scripting languages are garbage collected, and mixing several garbage collectors is difficult
and error-prone, in particular when both Gg-c and scripting language heaps are intermixed.

2. The GCC API is very big, ill-defined, heterogeneous, and evolving significantly. So manually coding
the glue code between GCC and a general-purpose scripting language is a big burden, and the result would
be made obsolete by a new GCC version by the time it was finished.

3. The GCC API is not only made of C functions, but also of macros which are not easy to call from
a scripting language.

4. Part of the GCC API is very low-level (e.g., field accessors), and would be invoked very often, so
may become a performance bottleneck if used through a glue routine.

5. GCC handles various internal data (notably using hundreds of global variables), some through
GTY-ed Gg-c collected pointers (like gimple_seq, edge, ...), others with manually allocated data
(e.g., omp_region for OpenMP parallel region information) or with numbers mapping some opaque
information (e.g., location_t are integers encoding source file locations). GCC data has widely
different types, usage conventions, or liveness.

6. There is no single root type (e.g., a root class like GObject¹¹ in Gtk) which would facilitate gluing



10 The author spent more than a month of work trying in vain to plug Ocaml into GCC.

11 See http://developer.gnome.org/gobject/
GCC into a dynamically typed language interpreter (a la Python, Guile, or Ruby).

7. Statically typing GCC data into a strongly typed language with type inference like Ocaml or Haskell
is impractical, since it would require the formalization of a type theory compatible with all the
actual GCC code.

8. Easily filtering complex nested data structures is very useful inside compilers, so most GCC
extensions need to pattern-match on existing GCC stuff (notably on Gimple or Trees).

The MELT (originally meaning "Middle End Lisp Translator") Domain Specific Language has been
developed to increase, as any high-level DSL does, the programmer's productivity. MELT has its own
generational copying garbage collector above Gg-c to address point 1. The oddity of the GCC API (points
2, 3, 4) is handled by generating well-fitted C code, and by providing mechanisms to ease that C source
code generation. Items 4, 5, 6, 7 are tackled by mixing MELT dynamically typed values with raw GCC
stuff. MELT has a powerful pattern matching ability to handle the last point 8, because scripting languages
don't offer extensible or embeddable pattern matching (on data structures internal to the embedding
application).

MELT is being used for various GCC extensions (work in progress): 

• simple warning and optimization like fprintf (stdout, ...) detection and transformation (handling
it on Gimple representation is preferable to simple textual replacement, because it cooperates with
the compiler inlining transformation);

• Jeremie Salvucci has coded a Gimple → C transformer (to feed some other tool);

• Pierre Vittet is coding various domain-specific warnings (e.g., detection of untested calls to fopen);

• the author is developing an extension to generate OpenCL code from some Gimple, to transport
some highly parallel regular (e.g., matrix) code to GPUs.



1.3 MELT = a DSL translated to code friendly with GCC internals 

The legacy constraints given by GCC on additional (e.g., plugins') code suggest that a DSL for extending
it could be implemented by generating C code suitable for GCC internals, and by providing language
constructs translatable into C code conforming to GCC coding style and conventions. Other attempts to
embed a scripting language into GCC (Javascript [9] for coding rules in Firefox, Haskell for enhancing
C++ template meta-programming, or Python¹²) have restricted themselves to a tiny part of the GCC
API; Volanschi [29] describes a modified GCC compiler with specialized matching rules.

Therefore, the reasonable way to provide a higher-level domain specific language for GCC
extensions is to dynamically generate suitable C code adapted to GCC's style and legacy and similar in
form to existing hand-coded C routines inside GCC. This is the driving idea of our MELT domain specific
language and plugin implementation [24, 25, 26]. By generating suitable C code for GCC internals, MELT
fits well into existing GCC technology. This is in sharp contrast with the Emacs editor or the C--
compiler [23], whose architecture was designed and centered on an embedded interpreter (E-Lisp for Emacs,
Luaocaml for C--).

MELT is a Lisp-looking DSL designed to work on GCC internals. It handles both dynamically typed 
MELT values and raw GCC stuff (like gimple, tree, edge and many others). It supports applicative, object 
and reflective programming styles. It offers powerful pattern matching facilities to work on GCC internal 
representations, essential inside a compiler. It is translated into C code and offers linguistic devices to
deal nicely with GCC legacy code.



12 See David Malcolm's GCC Python plugin announced in http://gcc.gnu.org/ml/gcc/2011-06/msg00293.html
2 Using MELT and its runtime

2.1 MELT usage and organization overview

From the user's perspective, the GCC compiler enabled with MELT (GCC melt) can be run with a
command such as: gcc -fplugin=melt -fplugin-arg-melt-mode=opengpu -O -c foo.c. This instructs gcc
(the gcc-4.6 packaged in Debian) to run the compiler proper cc1, asks it to load the melt.so
plugin which provides the MELT specific runtime infrastructure, and passes to that plugin the argument
mode=opengpu while cc1 compiles the user's foo.c. The melt.so plugin initializes the MELT runtime,
and hence itself dlopen-s MELT modules like warmelt*.so & xtramelt*.so. These modules initialize MELT
data, e.g., classes, instances, closures, and handlers. The MELT handler associated with the opengpu mode
registers a new GCC pass (available in xtramelt-opengpu.melt) which is executed by the GCC pass
manager when compiling the file foo.c. This opengpu pass¹³ uses Graphite to find optimization
opportunities in loops and should generate OpenCL code to run these on GPUs, transforming the
Gimple to call that generated OpenCL code. The melt.so plugin is mostly hand-coded in C (in our
melt-runtime.[hc] files - 15KLOC, which #include generated files). The MELT modules warmelt*.so
& xtramelt*.so¹⁴ are coded in MELT (as source files warmelt*.melt, ..., xtramelt*.melt which have
been translated by MELT into generated C files warmelt*.c & xtramelt-*.c, themselves compiled into
modules warmelt*.so ...).

The MELT translator (able to generate *.c from *.melt) is bootstrapped so that it exercises
most of its features and its runtime: the translator's source code is coded in MELT, precisely the
melt/warmelt*.melt files (39KLOC), and the MELT source repository also contains the generated files
melt/generated/warmelt*.c (769KLOC). Other MELT files, like melt/xtramelt*.melt (6KLOC), don't
need to have their generated translation kept. The MELT translator¹⁵ is not a GCC front-end (since it
produces C code for the host system, not Generic or Gimple internal representations suited for the target
machine); it is even able to dynamically generate, during a GCC melt compiler invocation, some
temporary *.c code, run make to compile that into a temporary *.so, and load (i.e. dlopen) and execute
that, all in a single gcc user invocation. This can be useful for sophisticated static analysis [25]
specialized using partial evaluation techniques within the analyzer, or just to "run" a MELT file.

The MELT translator works in several steps: the reader builds s-expressions in the MELT heap.
Macro-expansion translates them into a MELT AST. Normalization introduces necessary temporaries and builds
a normal form. Generation makes a representation very close to C code. Finally, that representation is
emitted as generated C code. There is no optimization done by the MELT translator (except for the
compilation of pattern matching, see §4.4).

Translation from MELT code to C code is fast: on an x86-64 GNU/Linux desktop system¹⁶, the
6.5KLOC warmelt-normal.melt file is translated into five warmelt-normal*.c files with a total of
239KLOC in just one second (wall time). But 32 seconds are needed to build the warmelt-normal.so
module (with make¹⁷ running gcc -O1 -fPIC) from these generated C files. So most of the time is
spent in compiling the generated C code, not in generating it. In contrast to several DSLs persisting their


13 In April 2011, the opengpu pass, coded in MELT, is still incomplete in MELT 0.7 svn rev. 173182.

14 The module names warmelt*.so & xtramelt*.so are somehow indirectly hard-coded in melt-runtime.c but could be
overridden by many explicit -fplugin-arg-melt-* options.

15 The translation from file ana-simple.melt to ana-simple.c is done by invoking gcc -fplugin=melt
-fplugin-arg-melt-mode=translatefile -fplugin-arg-melt-arg=ana-simple.melt ... on an empty C file empty.c,
only to have cc1 launched by gcc!

16 An Intel Q9550 @ 2.83GHz, 8Gb RAM, fast 10KRPM Sata 150Gb disk, Debian/Sid/AMD64.

17 So it helps to run that in parallel using make -j; the 32 seconds timing is a sequential single-job make.
melt_ptr_t meltgc_new_int (meltobject_ptr_t discr_p, long num) {
  MELT_ENTERFRAME (2, NULL);
#define newintv meltfram__.mcfr_varptr[0]
#define discrv meltfram__.mcfr_varptr[1]
  discrv = (void *) discr_p;
  if (melt_magic_discr ((melt_ptr_t) (discrv)) != MELTOBMAG_OBJECT)
    goto end;
  if (((meltobject_ptr_t) discrv)->obj_num != MELTOBMAG_INT)
    goto end;
  newintv = meltgc_allocate (sizeof (struct meltint_st), 0);
  ((struct meltint_st *) newintv)->discr = (meltobject_ptr_t) discrv;
  ((struct meltint_st *) newintv)->val = num;
end:
  MELT_EXITFRAME ();
  return (melt_ptr_t) newintv;
}



Figure 2: MELT runtime function boxing an integer 

closures¹⁸ by serializing a mixture of data and code, MELT starts with an empty heap, so MELT modules'
initialization routines are mostly long and sequential C code initializing the MELT heap.

2.2 MELT runtime infrastructure 

The MELT runtime melt-runtime.c is built above the GCC infrastructure, notably Gg-c. However, Gg-c
is not a sufficient garbage collector for MELT values, like closures, lists, tuples, objects, ... As in
most applicative or functional languages, MELT code tends to allocate a lot of temporary values (which
often die quickly). So garbage collection (G-C) of MELT values may happen often, and does need to
happen even inside GCC passes written in MELT, not only between passes. These values are handled
by our generational copying MELT G-C, triggered by the MELT allocator when its birth region is full,
and backed up by the existing Gg-c (so the old generation of MELT G-C is the Gg-c heap). Generational
copying GCs [11] quickly dispose of dead young temporary values by discarding them at once after having
copied each live young value out of the birth region, but they require a scan of all local variables,
forwarding of pointers to moved values, a write barrier, and normalization (like the administrative normal
form) of explicit intermediate values inside calls¹⁹. This is awkward in hand-written C code but
easy to generate. Minor MELT G-Cs are triggered before each call to ggc_collect (i.e. to the full Gg-c)
to ensure that all live young MELT values have migrated to the old Gg-c heap. Compatibility between our
MELT GC and Gg-c is thus achieved. An array of more than a hundred predefined values contains the
only "global" MELT values (which are global roots for both the MELT GC and Gg-c).
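The minor collection described above can be illustrated with a deliberately tiny sketch. It is not the MELT collector: all toy_* names are hypothetical, malloc stands in for the old Gg-c heap, old values are never reclaimed, and values are immutable pairs, which is the assumption that lets the sketch leave out the write barrier a real generational collector needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical value: an immutable pair.  Because pairs are never mutated
   after creation, old values never point to young ones, so this sketch can
   omit the write barrier.  */
struct toy_pair {
  struct toy_pair *forward;    /* new address once the pair has been copied */
  struct toy_pair *car, *cdr;
  long num;
  int young;                   /* 1 while the pair lives in the birth region */
};

#define TOY_BIRTH_SIZE 4
static struct toy_pair toy_birth[TOY_BIRTH_SIZE];   /* young generation */
static int toy_birth_used = 0;

#define TOY_MAX_ROOTS 8
static struct toy_pair **toy_roots[TOY_MAX_ROOTS];  /* addresses of local slots */
static int toy_nb_roots = 0;

/* Copy a young pair, and everything young it reaches, to the old heap;
   return its (possibly new) address.  */
static struct toy_pair *toy_forward (struct toy_pair *p)
{
  if (p == NULL || !p->young)
    return p;                  /* nil, or already in the old generation */
  if (p->forward != NULL)
    return p->forward;         /* already moved during this collection */
  struct toy_pair *copy = malloc (sizeof *copy);
  if (copy == NULL)
    abort ();
  *copy = *p;
  copy->young = 0;
  copy->forward = NULL;
  p->forward = copy;           /* leave a forwarding pointer behind */
  copy->car = toy_forward (p->car);
  copy->cdr = toy_forward (p->cdr);
  return copy;
}

/* Minor collection: update every registered local slot, then discard the
   whole birth region at once.  */
static void toy_minor_gc (void)
{
  for (int i = 0; i < toy_nb_roots; i++)
    *toy_roots[i] = toy_forward (*toy_roots[i]);
  toy_birth_used = 0;
}

/* Allocate a pair in the birth region, collecting first when it is full.  */
static struct toy_pair *toy_cons (long num, struct toy_pair *car,
                                  struct toy_pair *cdr)
{
  if (toy_birth_used == TOY_BIRTH_SIZE)
    {
      /* The arguments are local pointers: they must be roots during the
         collection, since the values they point to may move.  */
      assert (toy_nb_roots + 2 <= TOY_MAX_ROOTS);
      toy_roots[toy_nb_roots++] = &car;
      toy_roots[toy_nb_roots++] = &cdr;
      toy_minor_gc ();
      toy_nb_roots -= 2;
    }
  struct toy_pair *p = &toy_birth[toy_birth_used++];
  p->forward = NULL;
  p->young = 1;
  p->num = num;
  p->car = car;
  p->cdr = cdr;
  return p;
}
```

Even in this reduced form, every function holding a value across an allocation has to register its local slots, which is the bookkeeping that is tedious to write by hand and straightforward to generate.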

MELT call frames are aggregated as local struct-ures, containing local MELT values, the currently
called MELT closure, and local stuff (like raw tree pointers, etc.). Values inside these call frames are
known to the MELT garbage collector, which scans them and possibly moves them. Making these
call frames explicit facilitates introspective runtime reflection [18, 19, 20] at the MELT level; this might be useful
for some future sophisticated analysis, e.g., in abstract interpretation [2, 3] of recursive functions, as
a widening strategy. Concretely, local MELT values (and stuff) are aggregated in MELT call frames
(represented as generated C local struct-ures) organized in a singly-linked list. This also enables the
display of the MELT backtrace stack on errors.
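A minimal sketch of such a frame chain, assuming nothing beyond standard C, is given below. The toy_* names and the location strings are hypothetical; the real frames additionally carry the current closure and the local values scanned by the MELT G-C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified frame header.  */
struct toy_frame {
  const char *location;        /* source position, for backtraces */
  struct toy_frame *prev;      /* link to the caller's frame */
};

static struct toy_frame *toy_topframe = NULL;

/* Push a frame that lives in the C stack of the current function.  */
#define TOY_ENTERFRAME(LOC) \
  struct toy_frame toy_fram = { (LOC), toy_topframe }; \
  toy_topframe = &toy_fram

/* Pop it again; this must run on every exit path of the function.  */
#define TOY_EXITFRAME() (toy_topframe = toy_fram.prev)

/* Write the chain of active frames into BUF, innermost first, and return
   the number of frames written.  */
static int toy_backtrace (char *buf, size_t size)
{
  int depth = 0;
  size_t used = 0;
  if (size > 0)
    buf[0] = '\0';
  for (struct toy_frame *f = toy_topframe; f != NULL; f = f->prev, depth++)
    {
      int n = snprintf (buf + used, size - used, "#%d %s\n",
                        depth, f->location);
      if (n < 0 || (size_t) n >= size - used)
        break;                 /* buffer full: stop here */
      used += (size_t) n;
    }
  return depth;
}

/* Two nested functions following the enter/exit convention.  */
static int inner (char *buf, size_t size)
{
  TOY_ENTERFRAME ("inner.melt:7");
  int depth = toy_backtrace (buf, size);
  TOY_EXITFRAME ();
  return depth;
}

static int outer (char *buf, size_t size)
{
  TOY_ENTERFRAME ("outer.melt:3");
  int depth = inner (buf, size);
  TOY_EXITFRAME ();
  return depth;
}
```

Calling outer fills the buffer with one line per active frame, starting with the innermost one, and leaves toy_topframe restored to its previous value.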

18 Ocaml bytecode contains both code and data; GNU {Emacs, CLisp, Smalltalk} persist their entire heap image. But MELT
has no persistent data files, to avoid serializing GCC's stuff (i.e. GCC's native data).

19 That is, f(g(x),y) should be normalized as T = g(x); f(T,y) with T being a fresh temporary.
struct {
  int mcfr_nbvar;                       /* number of MELT local values */
  const char *mcfr_flocs;               /* location string for debugging */
  struct meltclosure_st *mcfr_clos;     /* current closure */
  struct melt_callframe_st *mcfr_prev;  /* link to previous MELT frame */
  void *mcfr_varptr[2];                 /* local MELT values */
} meltfram__;                           /* MELT current call frame */
static char locbuf_1591[84];            /* location string */
if (!locbuf_1591[0])
  snprintf (locbuf_1591, sizeof (locbuf_1591) - 1, "%s:%d",
            basename ("gcc/melt-runtime.c"), (int) 1591);
memset (&meltfram__, 0, sizeof (meltfram__));
meltfram__.mcfr_nbvar = (2);
meltfram__.mcfr_flocs = locbuf_1591;
meltfram__.mcfr_prev = (struct melt_callframe_st *) melt_topframe;
meltfram__.mcfr_clos = (((void *) 0));
melt_topframe = ((struct melt_callframe_st *) &meltfram__);



Figure 3: C preprocessor expansion of MELT_ENTERFRAME (2, NULL) at line 1591



Figure 2 gives an example of hand-written code following MELT conventions (a function
meltgc_new_int boxing an integer into a value, given a discriminant and the number to be boxed). It uses
the MELT_ENTERFRAME macro²⁰, which is expanded by the C preprocessor into the code in figure 3, which
declares and initializes the MELT call frame meltfram__. The MELT_EXITFRAME () macro occurrence
is expanded into melt_topframe = (struct melt_callframe_st *) (meltfram__.mcfr_prev); to pop
the current MELT frame. MELT provides a GCC pass checking some of the MELT coding conventions in the
hand-written part of the MELT runtime.

The MELT runtime depends deeply upon Gg-c, but does not depend much on the details of GCC's
main data structures like tree, gimple or loop: our melt-runtime.c can usually be recompiled
without changes when GCC's file gimple.h or tree.h changes, or when passes are changed or added in
GCC's core. The MELT translator files warmelt*.melt (and the generated warmelt*.c files) don't really
depend on GCC data structures like gimple. As a case in point, the major "gimple to tuple" transition²¹
in gcc-4.4, which impacted a lot of GCC files, was smoothly handled within the MELT translator.

The MELT files which are actually processing GCC internal representations (like our
xtramelt-*.melt or user MELT code), that is MELT code implementing new GCC passes, have to change
only when the GCC API changes - exactly like other GCC passes. Often, since the change is
compatible with existing code, these MELT files don't have to be changed at all (but should be recompiled into
modules).

MELT handles two kinds of things: the first-class MELT values (allocated and managed in MELT's
GC-ed heap) and other stuff, which is any other GCC data managed in C (either generated or
hand-written C code within GCC melt). Informally, Things = Values ∪ Stuff. So raw long-s, edge-s or tree-s are
stuff, and appear in MELT memory exactly as C-coded GCC passes handle them (without extra boxing).
Variables and (sub)expressions in MELT code, hence locals in MELT call frames, can be things of either
kind (values or stuff).

Since Gg-c requires each pointer to be of a gengtype-known type, values are really different from



20 The Ocaml runtime has similar macros.
21 In the old days of GCC version 4.3 the Gimple representation was physically implemented in tree-s and the C data structure
gimple did not exist yet; at that time, Gimple was sharing the same physical structures as Trees and Generic (so Gimple was
mostly a conventional restriction on Trees), that is, using many linked lists. The 4.4 release added the gimple structure to
represent them, using arrays, not lists, for sibling nodes; this significantly improved GCC's performance but required patching
many files.
stuff. There is unfortunately no way to implement full polymorphism in MELT: we cannot have MELT
tuples containing a mix of raw tree-s and MELT objects (even if both are Gg-c managed pointers). This
Gg-c limitation has deep consequences in the MELT language (stuff, i.e. GCC native data, sadly cannot be
first-class MELT values!).

Some parts of the MELT runtime are generated (by a special MELT mode). The implementations of the
various MELT values and stuff are described by MELT instances. So adding extra types of values, or
interfacing additional GCC stuff to MELT, is fairly simple, but requires a complete rebuild of MELT. Their
GTY((...)) struct-ure declarations in C are generated. Lower parts of the MELT runtime (allocating,
forwarding, scanning routines - see chapters 6 & 7 of [11] - for the copying MELT G-C, hash-table
implementation, ...) are also generated. This generated C code is kept in the source repository.

Notice that the distinction between first-class MELT values and plain stuff is essential in MELT, and 
is required by current GCC practices (notably its Gg-c collector). Therefore, the MELT language itself 
needs to denote them separately and explicitly, and the MELT runtime (and generated code) handles them 
differently. In that respect, MELT is not like Lisp, Scheme, Guile, Lua and Python. However, MELT 
coders should usually prefer handling values (the "first class citizens"), not raw stuff. 



2.3 MELT debugging aids 

When generating non-trivial C code, it is important to lower the risk of crashing the generated code²².
This is achieved by systematically clearing all data (both values and raw stuff) to avoid uninitialized
pointers (the MELT G-C also requires that), and by carefully coding low-level operations (primitives
§3.4.2, c-matchers §4.3, code chunks §3.4.1) with tests against null pointers.



The generated C code produced by the MELT translator contains many #line directives (suitably 
wrapped with #ifdef). In the rare cases when the gdb debugger needs to be used on MELT code (e.g.,
to deal with crashes or infinite loops), it will refer correctly to the originating MELT source file location. 
These positions are also written into MELT call frames, to ease backtracing on error. 
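The effect of a #line directive can be seen with the standard __FILE__ and __LINE__ macros, which report the same position that a debugger would show. The sketch below uses a hypothetical file name example.melt and a hypothetical helper; it only illustrates the standard C mechanism, not the code actually emitted by the MELT translator.

```c
#include <assert.h>
#include <string.h>

/* Position recorded by "generated" code; hypothetical helper type.  */
struct toy_position { const char *file; int line; };

static struct toy_position toy_generated_position (void)
{
  struct toy_position pos;
  /* A translator would emit the directive below just before the C code
     generated for line 12 of a (hypothetical) source file example.melt.
     The line following the directive is then numbered 12.  */
#line 12 "example.melt"
  pos.file = __FILE__; pos.line = __LINE__;
  return pos;
}
```

After the directive, compiler diagnostics and debug information for the following lines refer to example.melt rather than to the generated C file.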

MELT uses debug printing and assertions quite extensively. If enabled by the
-fplugin-arg-melt-debug program argument to gcc, a lot of debug printing happens: each use
of the debug_msg operation displays the current MELT source location, a message, and a value²³. For
debugging stuff data, primitives debugtree, debuggimple, etc. are available. Assertions are provided by
assert_msg which takes a message and a condition to check. When the check fails, the entire MELT call
stack is printed (with positions referring to *.melt source files).

When variadic functions become available in MELT, their first use will be to support polymorphic debug
printing. A debug "macro" would be expanded into calls to a debug_at variadic function, which would
get the source location value as its first argument, and the values or stuff to be debug-printed as subsequent
variadic arguments.

An older version of MELT could be used with an external probe, which was a graphical program 
interacting with cc1 through asynchronous textual protocols. This approach required a quite invasive
patch of GCC's code itself. The current GCC pass manager and plugin machinery now provides enough 
hooks, and future versions of MELT might communicate asynchronously with a central monitor (to be 
developed). 



22 However, it is still possible to make some MELT code crash, for instance by adding bugs in the C form of our code chunks
§3.4.1. In practice, MELT code crashes very rarely; most often it fails by breaking some assertions.
23 Values are printed for debug use with MELT message passing through the dbg_output & dbg_outputagain selectors.
3 The MELT language and its peculiarities

Some familiarity with a Lisp-like language (like Emacs Lisp, Scheme, Common Lisp, etc.) is welcome 
to understand this section. Acquaintance with a dynamically typed scripting language like Python, Guile 
or Ruby could also help. See the web site gcc-melt.org for more material (notably tutorials) on MELT.

MELT has a Lisp-like syntax because it was (at its very beginning) implemented with an initial
"external" MELT to C translator prototyped in Common Lisp. Since then, a lot of newer features have been
progressively added (using an older version of MELT to bootstrap its current version). The Emacs Lisp
language (in the Emacs editor), Guile (the Gnu implementation of Scheme), and machine description files
in GCC back-ends are successful examples of other Lisp dialects within Gnu software. Finally, existing
editing modes²⁴ for Lisp are sufficient for MELT.

An alternative infix syntax (code-named Milt) for MELT is in the works; the idea is to have an infix
parser, coded in MELT, for future *.milt files, which are parsed into MELT internal s-expressions (i.e.
into the same instances of class_sexpr as the MELT Lisp-like reader produces): symbols starting with + or
- are parsed as infix operators (as Ocaml does) with additive precedence, those starting with * or /
have multiplicative precedence, etc.
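The first-character rule can be sketched as a small classification function. This is an illustration only: the enumeration, its numeric levels and the function name are hypothetical, and Milt may well distinguish more operator classes than the two named above.

```c
#include <assert.h>

/* Hypothetical precedence levels; a larger value binds more tightly.  */
enum toy_precedence {
  TOY_NOT_INFIX = 0,
  TOY_ADDITIVE = 1,
  TOY_MULTIPLICATIVE = 2
};

/* Classify a symbol by its first character only, so that user-defined
   operators such as +. or *% get a precedence without any declaration.  */
static enum toy_precedence toy_infix_precedence (const char *symbol)
{
  switch (symbol[0])
    {
    case '+': case '-':
      return TOY_ADDITIVE;
    case '*': case '/':
      return TOY_MULTIPLICATIVE;
    default:
      return TOY_NOT_INFIX;
    }
}
```

With such a rule an infix parser needs no operator table: a symbol like +. is handled exactly like +, which is how Ocaml treats its floating-point operators.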

MELT shares with existing Lisp languages many syntactic and lexical conventions for comments,
indentation, symbols (which may be non alpha-numerical), case-insensitivity, and a lot of syntax (like
if, let, letrec, defun, cond ...). As in all Lisp dialects, everything is parenthesized like (operator
operands ...) so parentheses are highly significant. The quote, back-quote, comma and question mark
characters have special significance, so 'a is parsed exactly as (quote a), ?b as (question b) etc.
Like in Common Lisp, words prefixed with a colon like :long are considered as "keywords" and are not
subject to evaluation. Symbols and keywords exist both in source files and in the running MELT heap.



3.1 MELT macro-strings

Since "mixing" C code chunks (§3.4.1) inside MELT code is very important, simple meta-programming
is implemented by a lexical trick²⁵: macro-strings are strings prefixed with #{ and suffixed with }# and
are parsed specially; these prefix and suffix strings have been chosen because they usually don't appear in
C code. Within a macro-string, backslash does not escape characters, but $ and sometimes # are scanned
specially, to parse symbols inside macro-strings.

For example, MELT reads the macro-string #{/*$P#A*/printf ("a=°/.ld\n" , $A) ;}# ex- 
actly as a list ("/*" p "A*/printf (\"a=°/„ld\\n\" , " a ");") of 5 elements whose 1 st , 
3 rd and 5 th elements are string^] and 2 nd and 4 th elements are symbols p and a. This is useful when one 
wants to mix C code inside MELT code; some macro-strings are several dozens of lines long, but don't 
need any extra escapes (as would be required by using plain strings). 
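
The splitting of a macro-string into strings and symbols can be sketched in C as follows. This is only an illustration under stated assumptions, not the actual MELT reader: we assume that a symbol is a $ followed by letters, digits or underscores, that a # right after a symbol merely terminates it, and that symbols are lower-cased (as p and a are in the example above). The names split_macro_string and struct piece are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum piece_kind { PIECE_STRING, PIECE_SYMBOL };

struct piece {
  enum piece_kind kind;
  char text[128];
};

/* Append one piece (truncating over-long text); returns the new count. */
static int add_piece(struct piece *out, int n, int max,
                     enum piece_kind kind, const char *start, size_t len) {
  if (n >= max) return n;
  if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
  out[n].kind = kind;
  memcpy(out[n].text, start, len);
  out[n].text[len] = '\0';
  if (kind == PIECE_SYMBOL)            /* assumed: symbols are case-folded */
    for (size_t i = 0; i < len; i++)
      out[n].text[i] = (char)tolower((unsigned char)out[n].text[i]);
  return n + 1;
}

/* Split the body of a macro-string (the text between #{ and }#).
   Backslash is deliberately not treated as an escape. */
int split_macro_string(const char *body, struct piece *out, int max) {
  int n = 0;
  const char *lit = body;              /* start of the pending literal text */
  const char *p = body;
  while (*p) {
    if (p[0] == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
      if (p > lit)
        n = add_piece(out, n, max, PIECE_STRING, lit, (size_t)(p - lit));
      const char *s = ++p;
      while (isalnum((unsigned char)*p) || *p == '_') p++;
      n = add_piece(out, n, max, PIECE_SYMBOL, s, (size_t)(p - s));
      if (*p == '#') p++;              /* '#' only ends the symbol */
      lit = p;
    } else {
      p++;
    }
  }
  if (p > lit)
    n = add_piece(out, n, max, PIECE_STRING, lit, (size_t)(p - lit));
  return n;
}
```

Applied to the body of the macro-string of the example, this sketch gives the same five elements: three strings separated by the symbols p and a.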

Another example of macro-string is given in the following "hello-world" (complete) MELT program: 

;; file helloworld.melt
(code_chunk helloworldchunk
  #{int i=0; /* our $HELLOWORLDCHUNK */
  $HELLOWORLDCHUNK#_label: printf("hello world from MELT\n");
  if (i++ < 3) goto $HELLOWORLDCHUNK#_label; }#)

The macro-string spans 3 lines, and contains some C code with the helloworldchunk MELT
symbol. The above helloworld.melt file (of 4 lines) is translated into a helloworld.c file (of 389
lines (footnote 27) in C). It uses the code_chunk construct explained in §3.4.1 below (to emit
translated C code).

24 Emacs mode for Lisp is nearly enough for editing, highlighting and indenting MELT code.
25 Inspired by handling of $ in strings or "here-documents" by shells, Perl, Ruby, ...
26 The first string has the two characters /* and the last has the two characters );



3.2 MELT values and stuff

Every MELT value has a discriminant (at the start of the memory zone containing that value). As
an exception, nil (footnote 28), represented by the C null pointer, has conventionally a specific
discriminant discr_null_receiver. The discriminant of a value is used by the MELT runtime, by Ggc and
in MELT code to tell values apart. MELT values can be boxed stuff (e.g., boxed long or boxed tree),
closures, lists, pairs, tuples, boxed strings, ..., and MELT objects. Several predefined objects, e.g.,
class_class, discr_null_receiver ..., are required by the MELT runtime. The hierarchy of discriminants
is rooted at discr_any_receiver (footnote 29). Discriminants are objects (of class_discriminant). Core
classes and discriminants are predefined as MELT values (known by both Ggc and the MELT GC).

Each MELT object has its class as its discriminant. Classes are themselves objects and are
organized in a single-inheritance hierarchy rooted at class_root (whose parent discriminant is
discr_any_receiver). Objects are represented in C as exactly a structure with its class (i.e.
discriminant) obj_class, its unsigned hash-code obj_hash (initialized once and for all), an unsigned
"magic" short number obj_num, the unsigned short number of fields obj_len, and the obj_vartab[obj_len]
array of fields, which are MELT values. The obj_num in objects can be set at most once to a non-zero
unsigned short, and may be used as a tag: MELT and Ggc discriminate quickly a value's data-type (for
marking, scanning and other purposes) through the obj_num of their discriminant. So, safely testing in
C if a value p is a MELT closure is as fast as p != NULL && p->discr->obj_num == MELTOBMAG_CLOSURE.
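
As a rough illustration of this layout, the following C sketch declares an object structure with the fields named above and the tag test on the discriminant. It is a simplification under our own assumptions: the tag values, the struct value header, the closure and boxed_long structures and the helper make_discriminant are hypothetical, and the real MELT runtime declarations differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tag values; the real magic numbers belong to the MELT runtime. */
enum { MAGIC_OBJECT = 1, MAGIC_CLOSURE = 2, MAGIC_BOXED_LONG = 3 };

struct object;

/* Header shared by all values: the discriminant comes first. */
struct value { struct object *discr; };

/* Object layout following the fields listed in the text. */
struct object {
  struct object *obj_class;     /* class of the object, i.e. its discriminant */
  unsigned obj_hash;            /* hash-code, initialized once */
  unsigned short obj_num;       /* "magic" tag */
  unsigned short obj_len;       /* number of fields */
  struct value *obj_vartab[];   /* the obj_len fields */
};

/* Two non-object kinds of values (contents simplified). */
struct closure { struct value head; unsigned nbval; };
struct boxed_long { struct value head; long val; };

/* Allocate a field-less discriminant carrying the given tag. */
struct object *make_discriminant(unsigned short magic) {
  struct object *d = calloc(1, sizeof *d);
  if (d != NULL) d->obj_num = magic;
  return d;
}

/* The test from the text: one null check, two loads, one comparison. */
int is_closure(const struct value *p) {
  return p != NULL && p->discr->obj_num == MAGIC_CLOSURE;
}
```

The point of the design is that the kind of any value is found through its discriminant alone, so the same test works for the runtime, for the garbage collector and for generated code.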

MELT field descriptors and method selectors are objects. Every MELT value (object or not, even
nil) can be sent a message, since its discriminant (i.e., its class, if it is an object) has a method map (a
hash table associating selectors to method bodies) and a parent discriminant (or super-class). Message
passing in MELT is similar to that in Smalltalk and Ruby. Method bodies can be dynamically installed
with (install_method discriminant selector function) and removed at any time in any discriminant or
class. Method invocations use the method hash-maps (similar to method dictionaries in Smalltalk) to
find the actual method to run.
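
The lookup just described (method map first, then the parent discriminant) can be sketched in C. This is a minimal illustration, not the MELT runtime: a tiny linear table stands in for the hash map, selectors are plain strings instead of objects, and all names (struct discriminant, find_method, the two sample method bodies) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*method_fn)(void);   /* hypothetical method body type */

struct method_entry { const char *selector; method_fn body; };

/* A discriminant (or class): a method map plus a parent discriminant. */
struct discriminant {
  const char *name;
  struct discriminant *parent;
  struct method_entry methods[8];   /* linear table standing in for a hash map */
  int nmethods;
};

/* Install or replace a method, in the spirit of install_method. */
int install_method(struct discriminant *d, const char *sel, method_fn body) {
  for (int i = 0; i < d->nmethods; i++)
    if (strcmp(d->methods[i].selector, sel) == 0) {
      d->methods[i].body = body;
      return 1;
    }
  if (d->nmethods >= 8) return 0;   /* table full */
  d->methods[d->nmethods].selector = sel;
  d->methods[d->nmethods].body = body;
  d->nmethods++;
  return 1;
}

/* Find the method for a selector, walking up the parent chain. */
method_fn find_method(const struct discriminant *d, const char *sel) {
  for (; d != NULL; d = d->parent)
    for (int i = 0; i < d->nmethods; i++)
      if (strcmp(d->methods[i].selector, sel) == 0)
        return d->methods[i].body;
  return NULL;                      /* message not understood */
}

/* Two sample method bodies used to exercise the lookup. */
static const char *describe_any(void) { return "any"; }
static const char *describe_closure(void) { return "closure"; }
```

A method installed on the root discriminant acts as a catch-all handler, while a more specific discriminant can override it by installing its own body for the same selector.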

The MELT reader produces mostly objects and sometimes other values: S-expressions are parsed as
instances of class_sexpr (containing the expression's source location and the list of its components);
symbols (like == or let or x) as instances of class_symbol; keywords like :long or :else as instances
of class_keyword; numbers like -1 as values of discr_integer, etc.

Each stuff (that is, non-value things like long or tree ...) has its boxed value counterpart, so boxed
gimple-s are values containing, in addition to their discriminant (like discr_gimple), a raw gimple
pointer.

In MELT expressions, literal integers like 23 or strings like "hello\n" refer to raw :long or :cstring
stuff (footnote 30), not constant values. To be considered as MELT values they need to be quoted, so
(contrary to other Lisps) in MELT 2 ≢ '2: the plain 2 denotes a raw stuff of c-type :long so is not a
value, but the quoted expression '2 denotes the boxed integer 2 constant value of
discr_constant_integer, so they are not equivalent! As in Lisp, a quoted symbol like 'j denotes a
constant value (of class_symbol).

27 With 260 lines of code, including 111 preprocessor directives, mostly #line, and 129 comment or blank lines, and all the
code doing "initialization".

28 As in Common Lisp or Emacs Lisp (or C itself), but not as in Scheme, MELT nil value is considered as false, and every
non-nil value is true.

29 discr_any_receiver is rarely used, e.g., to install catch-all method handlers.

30 All :cstring are (const char*) C-strings in the text segment of the executable, so they are not malloc-ed.

To associate things (either MELT objects or GCC stuff, all of the same type) to MELT values,
hash-maps are extensively used: homogeneous hash tables keyed by objects, raw strings, or raw stuff like
tree-s or gimple-s ... are values (of discriminant discr_map_objects ..., discr_map_trees). While
hash-maps are more costly than direct fields in structures to associate some data to these structures, they
have the important benefit of not disturbing existing data structures of GCC. Even C plugins of
GCC cannot add, for their own convenience, extra fields into the carefully tuned tree or gimple structures
of GCC's tree.h or gimple.h.

Aggregate MELT values include not only objects, hash-tables and pairs, but also tuples (a value
containing a fixed number of immutable component values), closures, lists, ... Lists know their first and
last pairs. Aggregate values of the same kind may have various discriminants. For instance, within a
MELT class (which is itself a MELT object of class_class) a field gives the tuple of all super-classes
starting with class_root. That tuple has discr_class_sequence as discriminant, while most other
tuples have discr_multiple as discriminant.

Decaying values may help algorithms using memoization; they contain a value reference and a 
counter, decremented at each major garbage collection. When the counter reaches 0, the reference is 
cleared to nil. 
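
The behaviour of decaying values can be sketched in a few lines of C. This is only an illustration of the counter mechanism described above; the structure and function names are hypothetical, and we assume the collector calls a hook once per major collection.

```c
#include <stddef.h>

/* A decaying value: a reference plus a countdown of major collections. */
struct decayed {
  void *ref;          /* referenced value, NULL (nil) once decayed */
  unsigned counter;   /* major collections left before clearing */
};

void decayed_init(struct decayed *d, void *ref, unsigned lifetime) {
  d->ref = ref;
  d->counter = lifetime;
}

/* Hook assumed to be called once per major garbage collection. */
void decayed_on_major_gc(struct decayed *d) {
  if (d->ref == NULL) return;          /* already cleared */
  if (d->counter > 0) d->counter--;
  if (d->counter == 0) d->ref = NULL;  /* the memoized result is forgotten */
}

/* Reading gives the value, or NULL when it has decayed. */
void *decayed_get(const struct decayed *d) {
  return d->ref;
}
```

A memoizing algorithm would then recompute (and re-wrap) a result whenever decayed_get gives NULL, so that stale cached data does not stay alive forever.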

Adding a new important GCC C type like gimple (footnote 31) for some new stuff is fairly simple: add
(in MELT code) a new predefined C-type descriptor (like ctype_gimple referring to keyword :gimple) and
additional discriminants, and regenerate all of MELT. C-type descriptors (e.g., ctype_edge) and value
type descriptors (like valdesc_list) contain dozens of fields (names or body chunks of generated C
routines) used when generating the runtime support routines.

The :void keyword (and so ctype_void) is used for side-effecting code without results. C-type
keywords (like :void, :long, :tree, :value, :gimple, :gimple_seq, etc.) qualify (in MELT source
code) formal arguments, local variables (bound by let, ...), etc.

MELT is typed for things: e.g., the translator complains if the +i primitive addition operator
(expecting two raw :long stuff and giving a :long result) is given a value or a :tree argument.
Furthermore, let bindings can be explicitly typed (by default they bind a value). Within values, typing
is dynamic; for instance, a value is checked at runtime to be a closure before being applied. When
applying a MELT closure to arguments, the first argument, if any, needs to be a value (it would be the
receiver if the closure is a method for message passing) (footnote 32); others can be things, i.e. values
or stuff. In MELT applications, the types of secondary arguments and secondary results are described by
constant byte strings, and the secondary arguments or results are passed (in generated C code) as an
array of unions. The generated MELT function prologue (in C) checks that the formal and actual types of
secondary arguments are the same (otherwise, argument passing stops, and all following actual arguments
are cleared).
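
The prologue check on secondary arguments can be illustrated by the following C sketch. It is not the generated MELT code: the one-byte type codes, the union param and the function bind_secondary are hypothetical stand-ins chosen to show a descriptor string travelling alongside an array of unions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte type codes; the real descriptor bytes are MELT internals. */
#define ARG_LONG  'l'
#define ARG_TREE  't'
#define ARG_VALUE 'v'

/* One slot of the array of unions carrying secondary arguments. */
union param {
  long  as_long;
  void *as_tree;    /* stands in for a raw GCC tree pointer */
  void *as_value;   /* stands in for a MELT value pointer */
};

/* Copy actual secondary arguments into the formals while their type codes
   agree. At the first mismatch (or when the actuals run out) passing stops
   and every remaining formal is cleared. Returns the number bound. */
size_t bind_secondary(const char *formal_descr, union param *formals,
                      const char *actual_descr, const union param *actuals) {
  size_t n = 0;
  while (formal_descr[n] != '\0' && actual_descr[n] == formal_descr[n]) {
    formals[n] = actuals[n];
    n++;
  }
  for (size_t i = n; formal_descr[i] != '\0'; i++)
    memset(&formals[i], 0, sizeof formals[i]);
  return n;
}
```

For instance, with formals described by "llt" and actuals by "lvl", only the first argument is bound; the other two formals end up cleared, which MELT code then sees as false.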

All MELT things (value or stuff), in particular local variables (or mismatched formals), are initially
cleared (usually by zeroing the whole MELT call frame in the C prologue of each generated routine). So
MELT values are initially () (i.e., nil in MELT syntax), a :tree stuff is initially the null tree (i.e.
(tree)0 in C syntax), a :long stuff is initially 0L, a :cstring stuff is initialized to (const char*)0.
Notice that cleared stuff is considered as false in conditional context.

Functions written in MELT (with defun for named functions or lambda for anonymous ones) always
return a value as their primary result (which may be ignored by the caller, and defaults to nil). The first
formal argument (if any) and the primary result of MELT functions should be values (so nested function
calls deal mainly with values). Secondary arguments and results can be any things (each one is either
a value or some stuff). The (multicall ...) syntax binds primary and secondary results like Common
Lisp's multiple-value-bind.

31 This kind of radical addition doesn't happen often in the GCC community because it usually impacts a lot of GCC files.

32 The somewhat arbitrary requirement of having the first argument of every MELT function be a value speeds up calls to
functions with one single value argument, and permits using closures as methods without checks: sending a message to a raw
stuff like e.g., a tree won't work.

3.3 Syntax overview 

The following constructs should be familiar (except the last one, match, for pattern matching) since
they look like those in other Lisps. Notice that our let is always sequential (footnote 33). Formals in
abstractions (footnote 34) are restricted to start with a formal value; this speeds up the common case of
functions with a single value argument, and facilitates installation of any function as method (without
checking that the formal receiver is indeed a value).

A list of formal arguments (in lambda, defun etc.) contains either symbols (which are names of formals
bound by e.g., the lambda) like x or discr, or c-type keywords like :value or :long or :gimple. A
c-type keyword qualifies all succeeding formals up to the next c-type keyword, and the default c-type is
:value. For example, the formal arguments list (x y :long n k :gimple g :value v) has 6 formals:
x y v are MELT values, n k are raw long stuff, g is a raw gimple stuff.

Local bindings (in let or letrec) have an optional c-type annotation, then the newly bound symbol,
then the sub-expression bound to it. So (:long x 2) locally binds (in the body of the enclosing let)
the symbol x to the raw long stuff 2, and in the let body x is a raw long variable.

Patterns and pattern matching are explained in §4.



expressions (where n ≥ 0 and p ≥ 0)

  application        (φ α1 ... αn)                    apply function (or primitive) φ to arguments αi
  assignment         (setq ν ε)                       set local variable ν to ε
  message passing    (σ ρ α1 ... αn)                  send selector σ to receiver ρ with arguments αi
  let expression     (let (β1 ... βn) ε1 ... εp ε')   with local sequential bindings βi, evaluate
                                                      side-effecting sub-expressions εj and give
                                                      result of ε'
  sequence           (progn ε1 ... εn ε')             evaluate εi (for their side effects) and at last
                                                      ε', giving its result (like the operator , in C)
  abstraction        (lambda φ ε1 ... εn ε')          anonymous function with formals φ and
                                                      side-effecting expressions εi, return result of ε'
  pattern matching   (match ε χ1 ... χn)              match result of ε against match clauses χi, giving
                                                      result of last expression of matched clause



Conditional expressions alter control flow as usual. However, conditions can be things, e.g., the
0 :long stuff is false, other long stuff are true, a gimple stuff is false iff it is the null gimple
pointer, etc. The "else" part ε of an if test is optional. When missing, it is false, that is a cleared
thing. Notice that tested conditions and the result of a conditional expression can be either values or
raw stuff, but all the conditional sub-expressions of a condition should have consistent types, otherwise
the entire expression has :void type.

33 So the let of MELT is like the let* of Scheme!

34 Notice that lambda abstractions are constructive expressions and may appear in letrec or let bindings.

conditional expressions (where n ≥ 0 and p ≥ 0)

  test          (if τ θ ε)             if τ then θ else ε (like ?: in C)
  conditional   (cond κ1 ... κn)       evaluate conditions κi until one is satisfied
  conjunction   (and κ1 ... κn κ')     if κ1 and then ... and then κn is "true" (non nil or non zero)
                                       then κ', otherwise the cleared thing of same type
  disjunction   (or δ1 ... δn)         δ1 or else δ2 ..., i.e. the first of the δi which is "true"
                                       (non nil, or non zero, ...)

In a cond expression, every condition κi (except perhaps the last) is like (γi εi,1 ... εi,pi ε'i) with
pi ≥ 0. The first such condition for which γi is "true" gets its sub-expressions εi,j evaluated
sequentially for their side-effects and gives the result of ε'i. The last condition can be
(:else ε1 ... εn ε'); it is triggered if all previous conditions failed, and (with the sub-expressions
εi evaluated sequentially for their side-effects) gives the result of ε'.

MELT has some more expressions. 



more expressions

  loop                  (forever λ α1 ... αn)               loop indefinitely on the αi, which may exit
  exit                  (exit λ ε1 ... εn ε')               exit enclosing loop λ after side-effects of εi
                                                            and result of ε'
  return                (return ε ε1 ... εn)                return ε as the main result, and the εi as
                                                            secondary results
  multiple call         (multicall φ κ ε1 ... εn ε')        locally bind formals φ to main and secondary
                                                            result[s] of application or send κ, and
                                                            evaluate the εi for side-effects and ε' for
                                                            result
  recursive let         (letrec (β1 ... βn) ε1 ... εp)      with [mutually-] recursive constructive
                                                            bindings βi, evaluate sub-expressions εj
  field access          (get_field :Φ ε)                    if ε gives an appropriate object, retrieves
                                                            its field Φ, otherwise nil
  unsafe field access   (unsafe_get_field :Φ ε)             unsafe access without check, like the above
                                                            operation
  object update         (put_fields ε :Φ1 ε1 ... :Φn εn)    safely update (if appropriate) in the object
                                                            given by ε each field Φi with εi
  unsafe object update  (unsafe_put_fields ε :Φ1 ε1 ...)    unsafely update the object given by ε


The unsafe field access unsafe_get_field is reserved to expert MELT programmers, since it may
crash. The safer variant get_field tests that the expression ε evaluates (footnote 35) to a MELT object
of appropriate class before accessing a field Φ in it. Field updates with put_fields are safe
(footnote 36), with an unsafe but quicker variant unsafe_put_fields available for MELT experts.

Mutually recursive letrec bindings should have only constructive expressions. 



constructive expressions

  list       (list α1 ... αn)                   make a list of n values αi
  tuple      (tuple α1 ... αn)                  make a tuple of n values αi
  instance   (instance κ :Φ1 ε1 ... :Φn εn)     make an instance of class κ and n fields Φi set to
                                                value εi

Of course lambda expressions are also constructive and can appear inside letrec. Notice that since
MELT is translated into C, and because of runtime constraints, MELT recursion is never handled
tail-recursively, so it always consumes stack space. This also motivates iterative constructions (like
forever and our iterators).

35 i.e. test if the value ω of ε is an object which is a direct or indirect instance of the class defining field Φ, otherwise a nil
value is given.

36 Update object ω, value of ε, only if it is an object which is a direct or indirect instance of the class defining each field Φi.

Name defining expressions have a syntax starting with def. Most of them (except defun, defclass,
definstance) have no equivalent in other languages, because they define bindings related to C code
generation. For the MELT translator, bindings have various kinds; each binding kind is implemented as
some subclass of class_any_binding.

Name exporting expressions are essentially directives for the module system of MELT. Only exported 
names are visible outside a module. A module initialization expects a parent environment and produces 
a newer environment containing exported bindings. Both name defining and exporting expressions are 
supposed to appear only at the top-level (and should not be nested inside other MELT expressions). 



expressions defining names

  for functions      (defun ν φ ε1 ... εn ε')
                     define function ν with formal arguments φ and body ε1 ... εn ε'
  for classes        (defclass ν :super σ :fields (φ1 ... φn))
                     define class ν of super-class σ and own fields φi
  for instances      (definstance ι κ :f1 ε1 ... :fn εn)
                     define an instance ι of class κ with each field fi initialized to the value of εi
  for selectors      (defselector σ κ [:formals Ψ] :f1 ε1 ... :fn εn)
                     define a selector σ of class κ (usually class_selector) with each extra field fi
                     initialized to the value of εi (usually no extra fields are given so n = 0) and
                     with optional formals Ψ
  for primitives     (defprimitive ν φ :θ η)
                     define primitive ν with formal arguments φ, result c-type θ, by macro-string
                     expansion η
  for c-iterators    (defciterator ν Φ σ Ψ η η')
                     define c-iterator ν with input formals Φ, state symbol σ, local formals Ψ, start
                     expansion η, end expansion η'
  for c-matchers     (defcmatcher ν Φ Ψ σ η η')
                     define c-matcher ν with input formals Φ [the matched thing, then other inputs],
                     output formals Ψ, state symbol σ, test expansion η, fill expansion η'
  for fun-matchers   (defunmatcher ν Φ Ψ ε)
                     define funmatcher ν with input formals Φ, output formals Ψ, with function ε

expressions exporting names

  of values          (export_values ν1 ...)
                     export the names νi as bindings of values (e.g., of functions, objects, matcher,
                     selector, ...)
  of macros          (export_macro ν ε)
                     export name ν as a binding of a macro (expanded by the ε function)
  of classes         (export_class ν1 ...)
                     export every class name νi and all their own fields (as value bindings)
  as synonym         (export_synonym ν ν')
                     export the new name ν as a synonym of the existing name ν'



Macro-expansion is internally the first step of MELT translation to C: parsed (or in-heap) S-exprs (of
class_sexpr) are macro-expanded into a MELT "abstract syntax tree" (a subclass of class_source).
This macro machinery is extensively used, e.g., let and if constructs are macro-expanded (to instances
of class_source_let or class_source_if respectively).

Field names and class names are supposed to be globally unique, to enable checking their access or
update. Conventionally class names start with class_, and field names usually share a common unique
prefix in their class. There is no protection (i.e. visibility restriction like private in C++) for accessing
a field.

All definitions accept documentation annotation using :doc, and a documentation generator mode
produces documentation with cross-references in Texinfo format.

Miscellaneous constructs are available, to help in debugging or coding or to generate various C code
depending on compile-time conditions.

expressions for debugging

  debug message   (debug_msg ε μ)          debug printing message μ and value ε
  assert check    (assert_msg μ τ)         nice "halt" showing message μ when asserted test τ is false
  warning         (compile_warning μ ε)    like #warning in C: emit warning μ at MELT translation time
                                           and give ε

meta-conditionals

  Cpp test        (cppif σ ε ε')           conditional on a preprocessor symbol: emitted C code is
                                           #if σ code for ε #else code for ε' #endif
  Version test    (gccif β ε1 ...)         the εi are translated only if GCC has version prefix string β

Reflective access to the current and parent environment is possible (but useful only in exceptional
cases, since export_... directives are available to extend the current exported environment):

introspective expressions

  Parent environment    (parent_module_environment)
                        gives the previous module environment
  Current environment   (current_module_environment_container)
                        gives the container of the current module's environment



3.4 Linguistic constructs to fit MELT into GCC 

Several language constructs are available to help fit MELT into GCC, taking advantage of MELT and GCC
runtime infrastructure (notably Ggc). They usually use macro-strings to provide C code with holes.
Code chunks (§3.4.1) simply permit inserting C code in MELT code. Higher-level constructs describe
how to translate other MELT expressions into C: primitives (§3.4.2) describe how to translate low-level
operations into C; c-iterators (§3.4.3) define how iterative expressions are translated into for-like
loops; c-matchers (§4.3) define how to generate simple patterns (for matching), etc.



3.4.1 Code chunks 

Code chunks are simple MELT templates (of :void c-type) for generated C code. They are the lowest-level
way of impacting MELT C code generation, so are seldom used in MELT (like asm is rarely used
in C).

As a trivial example, where i is a MELT :long variable bound in an enclosing let,

  (code_chunk sta
    #{$sta#_lab: printf("i=%ld\n", $i++); goto $sta#_lab; }# )

would be translated to

  {sta_1_lab: printf("i=%ld\n", curfnum[3]++); goto sta_1_lab;}

the first time it is translated (i becoming curfnum[3] in C), but would use sta_2_lab the second time,
etc. The first argument of code_chunk (sta here) is a state symbol, expanded to a C identifier unique to
the code chunk's translation. The second argument is the macro-string serving as template to the
generated C code. The state symbol is uniquely expanded, and other symbols should be MELT variables and
are replaced by their translation. So the code_chunk of state symbol helloworldchunk in §3.1 is
translated into the following C code:

  int i=0; /* our HELLOWORLDCHUNK_1 */
  HELLOWORLDCHUNK_1_label: printf("hello world from MELT\n");
  if (i++ < 3) goto HELLOWORLDCHUNK_1_label; ;
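
The unique expansion of a state symbol can be pictured with a small C sketch. We assume here (it is our assumption, not a description of the MELT translator) that uniqueness comes from a counter kept by the translator and appended to the symbol's name; struct translator and fresh_state_name are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical translator state: how many chunks were expanded so far. */
struct translator { unsigned chunk_count; };

/* Write "<state>_<k>" into buf, where k is fresh for each translated chunk.
   Returns what snprintf returns (the length of the full name). */
int fresh_state_name(struct translator *tr, const char *state,
                     char *buf, size_t size) {
  tr->chunk_count++;
  return snprintf(buf, size, "%s_%u", state, tr->chunk_count);
}
```

Every occurrence of the state symbol inside one chunk is then replaced by the same fresh name, so that labels or local C variables built from it cannot clash between two translations of the same template.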
3.4.2 Primitives

Primitives define a MELT operator by its C expansion. The unary negation negi is defined exactly as:

  (defprimitive negi (:long i) :long
    :doc #{Integer unary negation of $i.}#
    #{(-($i))}# )

Here we specify that the formal argument i is, like the result of negi, a :long stuff. We give an
optional documentation, followed by the macro-string for the C expansion. Primitives don't have state
variables but are subject to normalization (footnote 37) and type checking. During expansion, the
formals appearing in the primitive definition are replaced appropriately.



3.4.3 C-iterators 

A MELT c-iterator is an operator translated into a for-like C loop. The GCC compiler defines many
constructs similar to C for loops, usually with a mixture of macros and/or trivial inlined functions.
C-iterators are needed in MELT because the GCC API defines many iterative conventions. For example,
to iterate on every gimple g inside a given gimple_seq s, GCC mandates (see §1.1) the use of a
gimple_stmt_iterator.

In MELT, to iterate on the :gimpleseq s obtained by the expression α and do something on every
:gimple g inside s, we can simply code (let ( (:gimpleseq s α) ) (each_in_gimpleseq (s)
(:gimple g) [do something with g...])) by invoking the c-iterator each_in_gimpleseq, with a list of
inputs (here simply (s)) and a list of local formals (here (:gimple g)) as the iterated things.

This c-iterator (a template for such for-like loops) is defined exactly as:

  (defciterator each_in_gimpleseq
    (:gimpleseq gseq)                  ;start formals
    eachgimplseq                       ;state
    (:gimple g)                        ;local formals
    #{/* start $eachgimplseq: */
      gimple_stmt_iterator gsi_$eachgimplseq;
      if ($gseq) for (gsi_$eachgimplseq = gsi_start($gseq);
                      !gsi_end_p(gsi_$eachgimplseq);
                      gsi_next(&gsi_$eachgimplseq)) {
        $g = gsi_stmt(gsi_$eachgimplseq); }#
    #{ } /* end $eachgimplseq */ }#)

We give the start formals, state symbol, local formals and the "before" and "after" expansion of the
generated loop block. The expansion of the body of the invocation goes between the before and after
expansions. C-iterator occurrences are also normalized (like primitive occurrences are). MELT expressions
using c-iterators give a :void result, since they are used only for their side effects.
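
To make the "before", body and "after" parts concrete, here is a self-contained C sketch of the shape of a loop produced from such a template. The sequence and iterator types below are small stand-ins written for this illustration (they are not the GCC API), and the body, which counts statements with a positive code, is an arbitrary example.

```c
#include <stddef.h>

/* Stand-ins for a statement sequence and its iterator; not the GCC API. */
struct stmt { int code; struct stmt *next; };
typedef struct stmt *stmt_seq;
typedef struct { struct stmt *cur; } stmt_iter;

static stmt_iter it_start(stmt_seq s) { stmt_iter it = { s }; return it; }
static int it_end_p(stmt_iter it) { return it.cur == NULL; }
static void it_next(stmt_iter *it) { it->cur = it->cur->next; }
static struct stmt *it_stmt(stmt_iter it) { return it.cur; }

/* Shape of the C emitted for an invocation of a c-iterator over s with
   local formal g, the body counting statements whose code is positive. */
long count_positive(stmt_seq s) {
  long count = 0;
  struct stmt *g = NULL;
  /* "before" expansion, the state symbol having become eachseq_1 */
  stmt_iter it_eachseq_1;
  if (s) for (it_eachseq_1 = it_start(s);
              !it_end_p(it_eachseq_1);
              it_next(&it_eachseq_1)) {
    g = it_stmt(it_eachseq_1);
    /* expansion of the body of the invocation */
    if (g->code > 0) count++;
  } /* "after" expansion: closes the loop block */
  return count;
}
```

The state symbol gives the iterator variable a unique name, so that two nested or successive uses of the same c-iterator in one generated routine do not collide.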



3.5 Modules, environments, standard library and hooks 

A single *.melt source file (footnote 38) is translated into a single module loaded by the MELT
run-time. The module's generated start_module_melt routine [often quite big] takes a parent
environment, executes the top-level forms, and finally returns the newly created module's environment.
Environments and their bindings are reified as objects.

Only exported names add bindings in the module's environment. MELT code can explicitly export
defined values (like instances, selectors, functions, c-matchers, ...) using the (export_values ...)
construct; definitions of macros (or pat-macros [that is, pattern-macros producing abstract syntax of
patterns]) are exported using the (export_macro ...) construct or (export_patmacro ...); classes and
their own fields are exported using the (export_class ...) construct. Macros and pattern macros in MELT
are expanded into an abstract syntax tree (made of objects of sub-classes of class_source, e.g.,
instances of class_source_let or of class_source_apply, ...), not into s-expressions (i.e. objects of
class_sexpr, as provided by the reader).

37 Assuming that x is a MELT variable for a :long stuff, then the expression (+i (negi x) 1) is normalized as let α = −x,
β = α + 1 in β in pseudo-code, suitably represented inside MELT (where α, β are fresh gensym-ed variables).

38 MELT can also translate into C a sequence of S-expressions from memory, and then dynamically load the corresponding
temporary module after it has been C-compiled.

Field names should be globally unique: this enables (get_field :named_name x) to be safely
translated into something like "if x is an instance of class_named fetch its :named_name field otherwise
give nil", since MELT knows that named_name is a field of class_named.
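
Such a guarded access can be sketched in C. The sketch assumes that each class records the sequence of all its super-classes, root first, so that the instance test is a single indexed comparison; the structures and the functions is_instance and get_field below are hypothetical simplifications, not the code MELT generates.

```c
#include <stddef.h>

struct klass {
  const char *name;
  unsigned depth;                    /* number of super-classes */
  const struct klass *ancestors[4];  /* all super-classes, root first */
};

struct object {
  const struct klass *cls;
  unsigned nfields;
  void *fields[4];
};

/* True when the class of obj is k or (directly or indirectly) inherits from k. */
static int is_instance(const struct object *obj, const struct klass *k) {
  if (obj == NULL) return 0;
  const struct klass *c = obj->cls;
  if (c == k) return 1;
  return k->depth < c->depth && c->ancestors[k->depth] == k;
}

/* Safe field access: NULL (nil) unless obj is an instance of the class
   defining the field and the field index is within bounds. */
void *get_field(const struct object *obj, const struct klass *defining,
                unsigned index) {
  if (!is_instance(obj, defining) || index >= obj->nfields) return NULL;
  return obj->fields[index];
}
```

Because a field name determines its defining class, the translator can emit the class test without being told which class the programmer had in mind.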

As in C, there is only one name-space in MELT, which is technically, like Scheme, a Lisp1 dialect
(footnote 39) (in Queinnec's terminology [22]). This prompts a few naming conventions: most exported
names of a module share a common prefix; most field names of a given class share the same prefix unique
to the class, etc.

The entire MELT translation process [26] is implemented through many exported definitions which
can be used by expert MELT users to customize the MELT language to suit their needs. Language
constructs (footnote 40) give total access to environments (instances of class_environment).

Hooks for changing GCC's behavior are provided on top of the existing GCC plugin hooks (for in- 
stance, as exported primitives like install_melt_gcc_pass which installs a MELT instance describing a 
GCC pass and registers it inside GCC). 

A fairly extensive MELT standard library is available (and is used by the MELT translator), providing
many common facilities (map-reduce operations; debug output methods; run-time asserts printing the
MELT call stack on failure; translate-time conditionals emitted as #ifdef; ...) and interfaces to GCC
internals. Its .texi documentation is produced by a generator inside the MELT translator.

When GCC provides additional hooks for plugins, making them available to MELT code should
hopefully be quite easy.

4 Pattern matching in MELT 

Pattern matching [12] [14] [18] [30] is an essential operation in symbolic processing and formal
handling of programs, and is one of the selling points of high-level programming languages (notably
OCaml and Haskell). Several tasks inside GCC are mostly pattern matching (like simplification and
folding of constant expressions (footnote 41)). Code using MELT pattern matching facilities is much more
concise than its (generated or even hand-written) C equivalent.

4.1 Using patterns in MELT 

Developers using MELT often need to filter complex GCC stuff (in particular gimple or tree-s) in their 
GCC passes coded in MELT. This is best achieved with pattern matching. The matching may fail (if the 
data failed to pass the filter) or may extract information from the matched data. 

39 Each bound name is bound only once, and there are no separate namespaces like in C or Common Lisp.

40 Like (current_module_environment_container) and (parent_module_environment), etc.

41 Strangely, GCC has several specialized code generators, but none for pattern matching: so the file gcc/fold-const.c is
hand-written (16KLOC).

4.1.1 About pattern matching

Patterns are major syntactic constructs (like expressions and let-bindings in Scheme or MELT). In MELT,
a pattern starts with a question mark, which is parsed specially: ?x is the same as (question x) [it
is the pattern variable x]. ?_ is (footnote 42) the wildcard pattern (matching anything). An expression
occurring in pattern context is a constant pattern. Patterns may be nested (in composite patterns) and
occur in match expressions.

Elementary patterns are ultimately translated into code that tests that the matched thing μ can be
filtered by the pattern π, followed by code which extracts appropriate data from μ and fills some locals
with information extracted from μ. Composite patterns need to be translated and optimized to avoid,
when possible, repetitive tests or fills.
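
The "test, then fill" shape of the code for an elementary pattern can be shown with a short C sketch. The node type is a stand-in written for this illustration (not a GCC tree), and the function names are hypothetical; only the two-step structure is taken from the description above.

```c
#include <stddef.h>

/* Stand-in node codes and node type; not GCC tree codes. */
enum node_code { NODE_INTEGER_CST, NODE_VAR_DECL };

struct node { enum node_code code; long int_value; };

/* Test part: can the pattern filter this matched thing? */
static int integer_cst_test(const struct node *n) {
  return n != NULL && n->code == NODE_INTEGER_CST;
}

/* Fill part: extract data from the matched thing into a pattern variable. */
static void integer_cst_fill(const struct node *n, long *out) {
  *out = n->int_value;
}

/* Shape of the code for a clause matching an integer constant and binding
   a pattern variable v to its value: first the test, then the fill. */
int match_integer_cst(const struct node *n, long *v) {
  if (!integer_cst_test(n)) return 0;   /* match fails, v is left untouched */
  integer_cst_fill(n, v);
  return 1;
}
```

Keeping the test and the fill apart is what lets a translator share one test between several clauses of a composite pattern instead of repeating it.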



4.1.2 An example of pattern usage in gcc melt 

Many tasks depend upon the form of [some intermediate internal representation of] user source code, 
and require extracting some of its sub-components. For instance, the author has written (in a single day) 
a GCC extension in M ELT to check simple coding rules in melt-runtime . c, (e.g., in function of figure 2b. 
When enabled with -fplugin-melt-arg-mode=meltf rame, it adds a new pass (after the "ssa" pasf^j 
of GCC EH) melt_f rame.pass to GCC. This pass first finds the declaration of the local meltf ram__ in the 
following pass execute function: 

 1 (defun meltframe_exec (pass)
 2   (let (
 3     (:tree tfundecl (cfun_decl)) (:long nbvarptr 0)
 4     (:tree tmeltframdecl (null_tree)) (:tree tmeltframtype (null_tree)) )
 5     (each_local_decl_cfun () (:tree tlocdecl :long ix)
 6       (match tlocdecl
 7         (?(tree_var_decl
 8             ?(and ?tvtyp ?(tree_record_type_with_fields ?tmeltframrecnam ?tmeltframfields))
 9             ?(cstring_same "meltfram__") ?_)
10           (setq tmeltframdecl tlocdecl) (setq tmeltframtype tvtyp)
11           (foreach_field_in_record_type (tmeltframfields) (:tree tcurfield)
12             (match tcurfield
13               (?(tree_field_decl
14                   ?(tree_identifier ?(cstring_same "mcfr_varptr"))
15                   ?(tree_array_type ?telemtype
16                     ?(tree_integer_type_bounded ?tindextype
17                       ?(tree_integer_cst 0)
18                       ?(tree_integer_cst ?lmax)
19                       ?tsize)))
20                 (setq tmeltframvarptr tcurfield) (setq nbvarptr lmax)))))
21         (?_ (void))))



The let on line 2 spans the entire MELT function meltframe_exec, with bindings on lines 3 & 4 for the
locals tfundecl, nbvarptr, tmeltframdecl & tmeltframtype. The each_local_decl_cfun is a C-iterator
(iterating, on lines 5 to 11, over the Trees representing the local declarations in the function). The
match expression filters the current local declaration tlocdecl (lines 7-11). When it is a variable
declaration (line 7) whose type matches the sub-pattern on line 8 and whose name (line 9) is exactly
meltfram__, we assign tmeltframdecl & tmeltframtype appropriately (line 10), and we iterate
(line 11) on its fields to find, by the match (lines 12-21), the declaration of the field mcfr_varptr (in the
C code) and its array index upper bound lmax, assigning them (line 20) to the locals tmeltframvarptr &
nbvarptr. Otherwise, using the wildcard pattern ?_, we give a :void result for the match of tlocdecl
(line 21).

42 ?_ can be pronounced as "joker".

43 ssa means Static Single Assignment: at that stage the code is represented in Gimple/SSA form, so each
SSA variable is assigned once.

Once the declaration of meltfram__ and of its mcfr_varptr field has been found in the current
function (given by cfun inside GCC), we iterate on each basic block bb of that function, and on each
gimple statement g of that basic block, and we match that statement g to find assignments to or from
meltfram__.mcfr_varptr[k] where k is some constant integer index:



22 (each_bb_cfun () (:basic_block bb :tree fundecl)
23   (eachgimple_in_basicblock (bb)
24     (:gimple g)
25     (match g
26       (?(gimple_assign_single
27           ?(tree_array_ref ?(tree_component_ref tmeltframdecl tmeltframvarptr)
28                            ?(tree_integer_cst ?idst))
29           ?(tree_array_ref ?(tree_component_ref tmeltframdecl tmeltframvarptr)
30                            ?(tree_integer_cst ?isrc)))
31         [handle assign "meltfram__.mcfr_varptr[idst] = meltfram__.mcfr_varptr[isrc];"])
32       (?(gimple_assign_single
33           ?(tree_array_ref ?(tree_component_ref tmeltframdecl tmeltframvarptr)
34                            ?(tree_integer_cst ?idst))
35           ?rhs)
36         [handle assign "meltfram__.mcfr_varptr[idst] = rhs;"])
37       (?(gimple_assign_single ?lhs
38           ?(tree_array_ref ?(tree_component_ref tmeltframdecl tmeltframvarptr)
39                            ?(tree_integer_cst ?isrc)))
40         [handle assign "lhs = meltfram__.mcfr_varptr[isrc];"])



The gimple g is first matched against the most filtering pattern (lines 26-30, for assignments like
"meltfram__.mcfr_varptr[idst] = meltfram__.mcfr_varptr[isrc];"), then against the more
general patterns: lines 32-36 for "meltfram__.mcfr_varptr[idst] = rhs;", where rhs is any simple operand,
and lines 37-40 for "lhs = meltfram__.mcfr_varptr[isrc];". The MELT programmer
should order matching clauses from the more specific to the more general.
Other code (not shown here) in function meltframe_exec remembers all left-hand side and right-hand
side occurrences of meltfram__.mcfr_varptr[k], and issues a warning when such a slot is not used.
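Why the ordering of clauses matters can be seen in a small C sketch of first-match-wins dispatch. It does not use GCC's Gimple API: struct operand, struct assign and both functions are hypothetical stand-ins that only record whether each side of an assignment is a frame slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand: either a frame slot "varptr[index]" or anything else. */
struct operand { int is_slot; long index; };
struct assign { struct operand lhs, rhs; };

enum clause { SLOT_TO_SLOT, STORE_TO_SLOT, LOAD_FROM_SLOT, NO_MATCH };

/* Clauses are tried in sequence, most specific first; the first hit wins. */
enum clause classify(const struct assign *a) {
  assert(a != NULL);
  if (a->lhs.is_slot && a->rhs.is_slot) return SLOT_TO_SLOT;   /* like lines 26-31 */
  if (a->lhs.is_slot) return STORE_TO_SLOT;                    /* like lines 32-36 */
  if (a->rhs.is_slot) return LOAD_FROM_SLOT;                   /* like lines 37-40 */
  return NO_MATCH;
}

/* Same clauses with a general one first: the slot-to-slot case is shadowed. */
enum clause classify_misordered(const struct assign *a) {
  assert(a != NULL);
  if (a->lhs.is_slot) return STORE_TO_SLOT;
  if (a->lhs.is_slot && a->rhs.is_slot) return SLOT_TO_SLOT;   /* never reached */
  if (a->rhs.is_slot) return LOAD_FROM_SLOT;
  return NO_MATCH;
}
```

With the misordered version, a slot-to-slot copy is reported as a plain store, so the information about its right-hand side slot would be lost.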

We see that a match is made of several match-cases, tested in sequence until a match is found. Each
case starts with a pattern, followed by sub-expressions which are computed with the pattern variables
of the case set appropriately by the matching of the pattern; the last such sub-expression is the result of
the entire match. Like other conditional forms in MELT, match expressions can give any thing (stuff,
e.g., :long ... or even :void, or value) as their result. Patterns may be nested like the tree_var_decl or
tree_record_type above. All the locals for pattern variables in a given match-case are cleared (before
testing the pattern). It is good style to end a match with a catch-all wildcard ?_ pattern.

A pattern is usually composite (with nested sub-patterns) and has a double role: first, it should test
whether the matched thing fits; second, when it does, it should extract things and transmit them to any
sub-patterns; this is the fill of the pattern. The matching of a pattern should conventionally be without
side-effects (other than the fill, i.e. the assignment of pattern variables).

Patterns may be non-linear: in a matching case, the same pattern variable can occur more than once;
then it is set at its first occurrence, and tested for identity (with == in the generated C code) on all
the following occurrences. This is useful in patterns like ?(gimple_assign_single ?var ?var) to find
assignments of a variable var to itself.

44 A warning is issued if meltfram__ or mcfr_varptr has not been found.

45 We don't test for equality of values or other things, knowing that λ-term equality is undecidable, and
acknowledging that deep equality comparison of ASTs like tree or gimple is too expensive.
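The bind-then-compare behavior of a repeated pattern variable can be sketched in C as follows. The struct var and struct assignment types are hypothetical stand-ins (not GCC trees or gimples); the point is only that the second occurrence is checked with pointer identity, not by comparing contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a variable and a single assignment. */
struct var { const char *name; };
struct assignment { const struct var *lhs; const struct var *rhs; };

/* Sketch of matching a non-linear pattern such as ?(assign ?v ?v):
   the first occurrence of ?v binds the local, the second occurrence
   is tested with ==, i.e. pointer identity. */
bool match_self_assign(const struct assignment *a, const struct var **out_v) {
  assert(out_v != NULL);
  const struct var *v = NULL;      /* pattern-variable local, cleared first */
  if (a == NULL)
    return false;
  v = a->lhs;                      /* first occurrence: bind */
  if (a->rhs != v)                 /* second occurrence: identity test */
    return false;
  *out_v = v;
  return true;
}
```

Two distinct objects that happen to carry the same name do not match here, which mirrors the choice of identity over deep equality.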



4.2 Pattern syntax overview 

A pattern π may match some matched thing μ, or may fail. If the matching succeeds, sub-patterns may
be matched, and pattern variables may become bound. The thing bound by some pattern variable is
checked in following occurrences of the same pattern variable and is available inside the match-clause
body.

Patterns may be one of: 

• expressions ε (e.g., constant literals) are (degenerate) patterns. They match the matched data μ iff
ε == μ (in the C sense of equality, which for pointers is their identity).

• The wildcard, noted ?_, matches everything (every value or stuff) and never fails.

• a pattern variable ?ν matches μ if it was not yet set (by a previous [sub-]matching of the same ?ν). In
addition, it is then bound to μ. If the pattern variable was previously set, it is tested for identity
(with equality in the C sense).

• most patterns are matcher patterns ?(m ε_1 ... ε_n π_1 ... π_p) where the n ≥ 0 expressions ε_i
are input parameters to the matcher m and the π_j sub-patterns are passed extracted data. The
matcher is either a c-matcher (declaring how to translate that pattern to C code) or a fun-matcher
(matching is done by a MELT function returning secondary things).

• instance patterns are like ?(instance κ :φ_1 π_1 ... :φ_n π_n); the matched μ is an object of [a
sub-]class κ whose field φ_i matches sub-pattern π_i.

• conjunctive patterns are ?(and π_1 ... π_n) and they match μ iff every π_i in sequence matches μ;
notice that when some π_i is a pattern variable ?ν, that variable is matched and μ should match the
further π_j (with j > i) with ν appropriately bound to μ. (This generalizes the as keyword inside
Ocaml patterns).

• disjunctive patterns are ?(or π_1 ... π_n) and they match μ if one of the π_i matches μ.
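The semantics of the simpler pattern kinds listed above can be made concrete with a tiny interpreter in C. This is only an illustration under strong assumptions: MELT translates patterns to C rather than interpreting them, matched things here are plain long integers, matcher and instance patterns are left out, and restoring the bindings after a failed alternative of a disjunction is a choice made for this sketch. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pat_kind { PAT_CONST, PAT_WILDCARD, PAT_VAR, PAT_AND, PAT_OR };

struct pattern {
  enum pat_kind kind;
  long constant;                       /* PAT_CONST: value of the expression */
  int var;                             /* PAT_VAR: index of the variable */
  const struct pattern *const *subs;   /* PAT_AND / PAT_OR: sub-patterns */
  size_t nsubs;
};

#define MAX_VARS 8
struct bindings { bool set[MAX_VARS]; long value[MAX_VARS]; };

/* Returns true iff pattern pi matches mu; env must be cleared by the
   caller before each match clause is tried. */
bool pat_match(const struct pattern *pi, long mu, struct bindings *env) {
  assert(pi != NULL && env != NULL);
  switch (pi->kind) {
  case PAT_CONST:
    return pi->constant == mu;             /* C equality */
  case PAT_WILDCARD:
    return true;                           /* never fails */
  case PAT_VAR:
    assert(pi->var >= 0 && pi->var < MAX_VARS);
    if (env->set[pi->var])                 /* already set: test */
      return env->value[pi->var] == mu;
    env->set[pi->var] = true;              /* first occurrence: bind */
    env->value[pi->var] = mu;
    return true;
  case PAT_AND:
    for (size_t i = 0; i < pi->nsubs; i++)
      if (!pat_match(pi->subs[i], mu, env))
        return false;
    return true;
  case PAT_OR:
    for (size_t i = 0; i < pi->nsubs; i++) {
      struct bindings saved = *env;        /* undo a failed alternative */
      if (pat_match(pi->subs[i], mu, env))
        return true;
      *env = saved;
    }
    return false;
  }
  return false;
}
```

For example, a conjunction of a variable and a disjunction of the constants 5 and 7 matches the integer 7 and binds the variable to 7, in the spirit of the as keyword mentioned above.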



4.3 C-matchers and fun-matchers 

The c-matchers are one of the building blocks of patterns, much like primitives are one of the
building blocks of expressions. Like primitives, c-matchers are defined as a specialized C code generation
template. In the example above (§4.1.2), most composite patterns involve c-matchers: tree_var_decl,
tree_record_type and cstring_same are c-matchers.

Like every pattern, a c-matcher defines how the pattern using it should perform its test, and then
how it should do its fill. A simple example of a c-matcher is cstring_same: some :cstring stuff σ
matches the pattern ?(cstring_same "fprintf") iff σ is the same as the const char* string "fprintf"
given as input to our c-matcher. This c-matcher has a test part, but no fill part (because it is used
without sub-patterns).

(defcmatcher cstring_same (:cstring str cstr) () strsam
  :doc #{The $CSTRING_SAME c-matcher matches a string $STR iff it equals the constant string $CSTR.
  The match fails if $STR is null or different from $CSTR.}#
  #{ /*$STRSAM test*/ ($STR != (const char*)0 && $CSTR != (const char*)0 && !strcmp($STR, $CSTR)) }# )

Notice that the state symbol strsam is used inside a comment, to uniquely identify each occurrence in 
the generated C, and that we take care of testing against null const char* pointers to avoid crashes. 
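The test expansion above can be written by hand as an ordinary C function, which makes its null-safety easy to check. The function names cstring_same_test and count_same are hypothetical (MELT expands the macro-string inline rather than calling a function); the condition itself is the one from the c-matcher definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hand-written equivalent of the cstring_same test expansion:
   both pointers must be non-null before strcmp is called. */
bool cstring_same_test(const char *str, const char *cstr) {
  return str != (const char *)0 && cstr != (const char *)0
         && !strcmp(str, cstr);
}

/* Count the names that a pattern ?(cstring_same cstr) would accept;
   null entries are simply rejected by the test. */
size_t count_same(const char *const *names, size_t n, const char *cstr) {
  assert(names != NULL || n == 0);
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (cstring_same_test(names[i], cstr))
      hits++;
  return hits;
}
```

Because the null checks come first and && short-circuits, strcmp is never called with a null argument.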

A more complex (and GCC-specific) example is the gimple_assign_single c-matcher (to filter single
assignments in compiled code). It defines both a testing and a filling expansion using two macro-strings:

(defcmatcher gimple_assign_single
  (:gimple ga) (:tree lhs rhs) gimpassi
  #{ /*$GIMPASSI test*/ ($GA && gimple_assign_single_p ($GA)) }#
  #{ /*$GIMPASSI fill*/ $LHS = gimple_assign_lhs ($GA); $RHS = gimple_assign_rhs1 ($GA); }# )

Here ga is the matched gimple, and lhs & rhs are the output formals: they are assigned in the fill
expansion to transmit trees to sub-patterns.

C-matchers are a bit like Wadler's notion of Views [30], but are expanded into C code. MELT also
has fun-matchers, which similarly are views defined by a MELT function returning a non-nil value if the
test succeeded, with several secondary results giving the extracted things to sub-patterns. For example,
the following code defines a fun-matcher isbiggereven such that the pattern ?(isbiggereven μ π)
matches a :long stuff σ iff σ is an even number, greater than the number μ, and σ/2 matches the
sub-pattern π. We define an auxiliary function matchbiggereven to do the matching [we could have used
a lambda]. If the match succeeds, it returns a true (i.e. non-nil) value (here fmat) and the integer to be
matched with π. Its first actual argument is the fun-matcher isbiggereven itself. The testing behavior of
the matching function is its first result (nil or not), and the fill behavior is through the secondary results.

(defun matchbiggereven (fmat :long s m)
  ; fmat is the funmatcher, s is the matched σ, m is the minimal μ
  (if (==i (%iraw s 2) 0)
      (if (>i s m) (return fmat (/iraw s 2)))))
(defunmatcher isbiggereven (:long s m) (:long o) matchbiggereven)

The fun-matcher definition has an input formals list and an output formal list, together defining the 
expected usage of the fun-matcher operator in patterns. 
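The contract of this fun-matcher (a primary result for the test, a secondary result for the fill) can be mirrored in C with a boolean return value and an output parameter. This is a sketch of the described behavior, not of MELT's calling convention; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Test: s is even and greater than m. Fill: *out_half receives s / 2.
   The boolean result plays the role of the primary (nil or non-nil)
   result; out_half plays the role of the secondary result. */
bool match_bigger_even(long s, long m, long *out_half) {
  assert(out_half != NULL);
  if (s % 2 != 0)
    return false;
  if (s <= m)
    return false;
  *out_half = s / 2;
  return true;
}

/* Nested use, in the spirit of a pattern ?(isbiggereven m ?(isbiggereven 0 ?q)):
   the half extracted by the outer matcher is what the inner one matches. */
bool match_bigger_even_twice(long s, long m, long *out_q) {
  long half = 0;
  return match_bigger_even(s, m, &half) && match_bigger_even(half, 0, out_q);
}
```

The output parameter is only written when the test succeeds, so a failed match leaves the caller's local unchanged.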

Both c-matchers and fun-matchers can also define what they mean in expression context (rather than
in pattern context). So the same name can be used for constructing expressions and for destructuring
patterns.

4.4 Implementing patterns in MELT 

Designing and implementing patterns in MELT was quite difficult, because a good translation of pattern
matching should:

• factorize, when possible, common sub-patterns, to avoid testing the same thing twice.

• share, when appropriate, data extracted from sub-patterns.

• preferably re-use the many temporary locals used by the translation of the match, to lower the
current MELT stack frame size.

Our first implementation of pattern translation to C is quite naive, and uses simple memoization
techniques to factorize sub-patterns or share extracted data.

A better implementation of the pattern translator explicitly builds a directed graph (with shared nodes
for tests and data), like figure 4. The graph has data nodes (for temporary variables for [sub-]matched
things, or for boolean flags internal to the match) and elementary control steps. These steps are either
tests (with both a "then" and an "else" jump to other steps) or computations (usually with a single jump
to a successor step). Some steps just set an internal boolean flag, or compute the conjunction of other
flags. Other steps represent the testing or the filling parts of c-matchers or fun-matchers. Final success
steps correspond to sub-expressions in the body of the matched clause and are executed if a flag is set.
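One possible in-memory shape for such a graph of steps is sketched below in C, together with a small driver that walks it. Everything here is an assumption made for illustration: the struct layout, the step kinds, the example graph (a first clause accepting even numbers and extracting their half, a second clause accepting negative numbers) and the idea of running the graph directly, whereas MELT translates its graph into C code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Data nodes: temporaries for matched things and internal boolean flags. */
struct match_state { long data[4]; bool flag[4]; };

enum step_kind { STEP_TEST, STEP_COMPUTE, STEP_SUCCESS, STEP_FAIL };

struct step {
  enum step_kind kind;
  bool (*test)(const struct match_state *);   /* used by STEP_TEST */
  void (*compute)(struct match_state *);      /* used by STEP_COMPUTE */
  int then_step;   /* TEST: jump on success; COMPUTE: single successor */
  int else_step;   /* TEST: jump on failure */
  int clause;      /* SUCCESS: index of the clause body to execute */
};

bool is_even(const struct match_state *st) { return st->data[0] % 2 == 0; }
bool is_negative(const struct match_state *st) { return st->data[0] < 0; }
void fill_half(struct match_state *st) {
  st->data[1] = st->data[0] / 2;   /* extracted data for a sub-pattern */
  st->flag[0] = true;              /* internal flag: the fill was done */
}

/* The "is negative" test is reached only when the "is even" test failed. */
const struct step example_graph[] = {
  { STEP_TEST,    is_even,     NULL,      1,  3,  -1 },
  { STEP_COMPUTE, NULL,        fill_half, 2,  -1, -1 },
  { STEP_SUCCESS, NULL,        NULL,      -1, -1, 0  },
  { STEP_TEST,    is_negative, NULL,      4,  5,  -1 },
  { STEP_SUCCESS, NULL,        NULL,      -1, -1, 1  },
  { STEP_FAIL,    NULL,        NULL,      -1, -1, -1 },
};

/* Walks the graph from step 0; returns the matched clause index or -1.
   The graph is acyclic, so at most nsteps steps are visited. */
int run_match(const struct step *steps, size_t nsteps, struct match_state *st) {
  assert(steps != NULL && st != NULL);
  size_t pc = 0;
  for (size_t visited = 0; visited < nsteps; visited++) {
    assert(pc < nsteps);
    const struct step *s = &steps[pc];
    switch (s->kind) {
    case STEP_TEST:
      pc = (size_t)(s->test(st) ? s->then_step : s->else_step);
      break;
    case STEP_COMPUTE:
      s->compute(st);
      pc = (size_t)s->then_step;
      break;
    case STEP_SUCCESS:
      return s->clause;
    case STEP_FAIL:
      return -1;
    }
  }
  return -1;
}
```

Sharing shows up in the indices: any step can be the target of several jumps, so a test needed by two clauses would appear once in the array and be reached from both.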



46 Our isbiggereven could also be defined as a c-matcher.

For instance, a simple match (where v is the matched value) like the one below is translated into the
complex internal graph given in figure 4:

(match v
  (?(instance class_symbol :named_name ?synam)
    (f synam))
  (?(instance class_container :container_value ?(and ?cval ?(integerbox_of ?_)))
    (g cval)))



A more complex match like (match tcurfield ...) of §4.1.2 (code lines 12-20) produces about 20 match
steps and 12 match data. This enhanced pattern matching is not entirely implemented at the time of
writing: the generation of the control graph for the match is implemented, but its translation into C is
incomplete.



5 Conclusions and future work 

Enhancing huge legacy software with a domain-specific language or scripting language is always a
major challenge, since incorporating a DSL inside a software system is a major architectural design
decision which should be taken early. Mature big software projects like GCC have their coding habits,
memory management strategies and data organization, which make it very difficult to embed an existing
scripting language (like Python, Ocaml, Ruby, ...).

We have shown that adding a high-level DSL to a big software like GCC is still possible, by designing
a run-time system compatible with the existing infrastructure (notably Ggc) and, most importantly, by
having the DSL deal both with boxed values and raw existing stuff (§3.2). Translating the DSL to the
language (with its habits) used in that big software (C for GCC) enables high-level language constructs
in our DSL. We have described a set of language constructs in §3.4 (c-matchers, primitives, c-iterators,
...) which give templates for C code generation.

Our empirical approach of designing and implementing a DSL like MELT to fit into a large software
like GCC could probably be re-used for adding DSLs inside other huge mature software projects:
designing a runtime suitable for such a project, having several sorts of things (values and stuff), generating
code in the style of the existing legacy, and defining adequate language constructs giving code-generating
templates.

Future work within MELT is mostly using this DSL to build interesting GCC extensions. P. Vittet has
started in May 2011 a Google Summer of Code project to add specific warnings into GCC using MELT. A.
Lissy considers using it for Linux kernel [13] code analysis. The opengpu mode should be completed.
Also, some language features can be added or improved:

1. variadic functions, possibly provided by a :rest keyword similar to Common Lisp's &rest. These
should be very useful for debugging and tracing messages.

2. adding backtracking or iterating pattern constructs; for instance to be able to have a pattern for any
:gimple_seq stuff containing at least one gimple matching a given sub-pattern.

3. adding a nice, usable and hygienic macro system, inspired by Scheme's defsyntax.

4. performance improvements might be achieved by sometimes translating MELT function calls into
a C function call whose signature mimics the MELT function signature.



5. a message caching machinery, where every MELT message passing occurrence would use a cache
(keeping the last class of the sending).

6. a central monitor, which would communicate with parallel GCC MELT compilations through
asynchronous textual protocols.

More generally, making MELT more high-level and more declarative (in J. Pitrat's [19, 20] sense) to
be able to express GCC passes easily and concisely is an interesting challenge, and could be transposed
to other legacy software.

47 To debug the pattern-match translator, MELT is generating a graph to be displayed with GraphViz. We
have edited it (by removing details like source code location) for clarity.